Disallow new requests after ≥10 and ≥1% undocumented error responses#1
Closed
Disallow new requests after ≥10 and ≥1% undocumented error responses#1
Conversation
changelog: cover 0.2.1
* n_results is renamed to n_success; * n_extracted_queries is removed, because it's always the same as n_results (i.e. n_success); * n_input_queries is removed: it wasn't really a number of input queries, (it was a number of processed queries), and it can be computed from other stats: success + fatal errors; * added a short comment which explains each stat value
clean up AggStats
Zyte Data API → Zyte API
add brotli as a dependency
Prepare for 0.4.0 release
Network error retry time: 5 minutes → 15 minutes
It seems aiohttp has troubles with edge cases of Keep-Alive, and disabling it helps with ServerDisconnectedErrors. Using aiohttp sessions is still important, because it allows to reduce the number of ClientConnectorErrors.
Don't reuse the connections.
Add conservative_retrying
Add Python 3.12 and 3.13, drop Python 3.8, update tool versions
Gallaecio
commented
Jan 3, 2025
Comment on lines
+75
to
+76
| if store_errors and isinstance(e, RequestError): | ||
| write_output(e.parsed.data) |
Owner
Author
There was a problem hiding this comment.
This was an error detected while addressing typing issues, and removing the request error mock from tests. write_output does json.dumps, so when passing a string here before, it was printing in the file '{"… instead of the expected {"….
Gallaecio
commented
Jan 3, 2025
| raise | ||
|
|
||
| logger.error(str(e)) | ||
| logger.exception("Exception raised during response handling") |
Owner
Author
There was a problem hiding this comment.
I found it useful to see the exception traceback when the exception is not RequestError. Although it might make things too verbose for RequestError. Maybe we should only log the traceback when the exception is unknown?
There was a problem hiding this comment.
This sounds like something that can/should be re-evaluated after using it.
wRAR
reviewed
Jan 10, 2025
wRAR
approved these changes
Jan 10, 2025
Open
1 task
Owner
Author
|
Continued at zytedata#82 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
cc @kmike @wRAR
To be merged into zytedata#78, or if that’s merged first, to be moved (the PR) to the upstream repo with
mainas the target branch.It’s a rather hacky approach, to be honest:
The number of total responses and undocumented error responses is tracked in class variables. Tests involve monkey patching to reset or hard-code their values.
If we want to keep the implementation at tenacity level, this is the cleanest path I saw, although I don’t discard there being better ways.
The use of the class to count, instead of an instance, is due to tenacity creating copies of the instance for each function wrapping. The specifics are not completely clear to me, I am not 100% sure there is no way around it, but this line seems problematic for instance-based storage of these counters. The statistics on the next line are also not global, and get reset on every wrapped call.
We could consider implementing this logic outside tenacity, in the client code. It would allow for a cleaner implementation, e.g. based on stats. The main issue I see, and it might be considered minor, is that then we cannot stop new requests as soon as the condition is met, we still need to wait for the retries of 1 of the on-going requests to finish, which means we may sometimes stop new requests only after e.g. 11+ and not only 10 undocumented error responses. There is also the issue of not being able to customize this logic through a custom retry policy class, but we might not want to support that to begin with, and if we did, we could instead provide some client parameter(s) instead.
I made some other decisions I’m not entirely sure about:
Post-merge work:
TooManyUndocumentedErrorsexception.